Abstract

Within the framework of an effective, coarse-grained hydrophobic-polar protein model, we employ
multicanonical Monte Carlo simulations to investigate free-energy landscapes and folding channels
of exemplary heteropolymer sequences, which are permutations of each other. Despite the simplicity
of the model, knowledge of the free-energy landscape as a function of a suitable system
order parameter enables us to reveal complex folding characteristics known from real bioproteins
and synthetic peptides, such as two-state folding, folding through weakly stable intermediates, and
glassy metastability.

PACS numbers: 05.10.-a, 87.15.Aa, 87.15.Cc 



Folding of linear chains of amino acids, i.e., bioproteins and synthetic peptides, is, for 
single-domain macromolecules, accompanied by the formation of secondary structures (he- 
lices, sheets, turns) and the tertiary hydrophobic-core collapse. While secondary structures 
are typically localized to segments of the peptide, the effective hydrophobic interaction be- 
tween nonbonded, nonpolar amino acid side chains results in a global, cooperative arrange- 
ment favoring folds with compact hydrophobic core and surrounding polar shell screening 
the core from the polar solvent. Systematic analyses for unravelling general folding princi- 
ples are extremely difficult in microscopic all-atom approaches, since the folding process is 
strongly dependent on the "disordered" sequence of amino acids - twenty different types can 
typically occur in bioproteins - and the native-fold formation is inevitably connected with, 
at least, significant parts of the sequence. Moreover, for most proteins, the folding process 
is relatively slow (microseconds to seconds), which is due to a complex, rugged shape of the 
free-energy landscape with "hidden" barriers, depending on sequence properties. 

Although there is no obvious system parameter that allows for a general description of the 
accompanying conformational transitions in folding processes (as, for example, the reaction 
coordinate in chemical reactions), it is known that there are only a few classes of characteristic
folding behaviors, mainly single-exponential folding, two-state folding, folding through
intermediates, and glass-like folding into metastable conformations [1, 2, 3, 4, 5, 6].

An important step towards a better theoretical understanding of the basic mechanisms
underlying these different classes could be the development and analysis of suitable
coarse-grained models focusing on mesoscopic scales. The idea to use a strongly simplified
model is two-fold: Firstly, it is believed that tertiary folding is mainly based on effective
hydrophobic interactions such that atomic details play a minor role. Secondly, systematic
comparative folding studies for mutated or permuted sequences are computationally
extremely demanding at the atomic level and are to date virtually impossible for realistic
proteins. In this Letter, we show that by employing a coarse-grained hydrophobic-polar
heteropolymer model [10] and monitoring a simple angular "order" parameter it is indeed
possible to identify different complex folding characteristics. This is comparable to studies
of phase transitions based on effective order parameters in other disordered systems such
as, e.g., spin glasses, where simplified models are successfully employed [11]. The individual
folding trajectories as discussed in this work will be characterized by a similarity parameter
which is related to the replica overlap parameter used in spin-glass analyses. This is useful
as the amino acid sequence induces intrinsic disorder and frustration into the system and
therefore a peptide behaves similarly to a spin system with a quenched disorder configuration
of couplings.



The simplified model [10] used incorporates only two types of amino acids, hydrophobic
and polar residues [12], and focuses on qualitative aspects of tertiary heteropolymer
folding, such as hydrophobic-core formation [13, 14, 16, 17]. This physical, effective-potential
approach has to be distinguished from knowledge-based models - typically of Go
type - where the contact map of the final fold already enters as input into the model. The
latter models have proven to be useful in understanding two-state folding of selected
proteins [..., 23]. On the other hand, the kinetics of effective models is not
biased towards a given structure, and a variety of folding behaviors can be studied. This
has particular implications for non-two-state folding and metastability, the latter primarily
concerning designed synthetic peptides or mutated biopolymers.

Our results are obtained by employing the standard hydrophobic-polar off-lattice AB
model [10] in three dimensions for the three sequences listed in Table I. The sequences were
chosen from the set of deliberately designed sequences in Ref. [25] and have the same content
of hydrophobic A (14 each) and polar B (6 each) residues. In the following, we denote by
$\mathbf{r}_i$ the spatial position of the $i$th monomer in the chain
$\mathbf{X} = \{\mathbf{r}_1, \ldots, \mathbf{r}_N\}$ of residues.
Covalent bonds have unit length. The bending angle between monomers $k$, $k+1$, and $k+2$
is $\vartheta_k$ ($0 \le \vartheta_k \le \pi$) and $\sigma_i = A, B$ symbolizes the type of the
monomer. The energy of a conformation is given by $E = E_{\rm bend} + E_{\rm LJ}$, where

$$E_{\rm bend} = \frac{1}{4} \sum_{k} \left(1 - \cos\vartheta_k\right) \qquad (1)$$

is the bending energy and

$$E_{\rm LJ} = 4 \sum_{j>i+1} \left( \frac{1}{r_{ij}^{12}} - \frac{C(\sigma_i,\sigma_j)}{r_{ij}^{6}} \right) \qquad (2)$$

is the contribution of the residue-type dependent Lennard-Jones potential, which depends
on the distance $r_{ij}$ of all pairs of nonbonded monomers $i$ and $j$, being long-range attractive
for AA and BB pairs [$C(A,A) = 1$, $C(B,B) = 0.5$] and repulsive for AB pairs of monomers
[$C(A,B) = C(B,A) = -0.5$]. Simulations of this model were performed using standard
multicanonical Monte Carlo techniques [2J] with spherical updates [17]. For each sequence,
10 independent simulations were performed and a total statistics of 2 x 10^ conformations 
entered into the data analysis. 
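As an illustration of how the model energy can be evaluated, the following minimal sketch computes $E = E_{\rm bend} + E_{\rm LJ}$ for a given set of monomer positions. It assumes the standard AB-model prefactors (1/4 for the bending term, 4 for the Lennard-Jones term), unit bond lengths, and that $\cos\vartheta_k$ is the scalar product of successive bond vectors. The function name `ab_energy` and the data layout are hypothetical and chosen only for this example.

```python
import math

# Coupling constants C(sigma_i, sigma_j) as quoted in the text.
C = {("A", "A"): 1.0, ("B", "B"): 0.5, ("A", "B"): -0.5, ("B", "A"): -0.5}


def ab_energy(positions, sequence):
    """Return E = E_bend + E_LJ for a chain with unit bond lengths.

    positions: list of (x, y, z) tuples, one per monomer
    sequence:  string of 'A'/'B' characters of the same length
    """
    n = len(positions)
    # Bond vectors b_k = r_{k+1} - r_k (assumed to have unit length).
    bonds = [tuple(q - p for p, q in zip(positions[k], positions[k + 1]))
             for k in range(n - 1)]
    # Bending energy (1/4) * sum_k (1 - cos(theta_k)), assuming
    # cos(theta_k) = b_k . b_{k+1} for unit bonds.
    e_bend = 0.25 * sum(
        1.0 - sum(a * b for a, b in zip(bonds[k], bonds[k + 1]))
        for k in range(n - 2))
    # Lennard-Jones energy over all nonbonded pairs j > i + 1.
    e_lj = 0.0
    for i in range(n - 2):
        for j in range(i + 2, n):
            r2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            inv_r6 = 1.0 / r2 ** 3
            e_lj += 4.0 * (inv_r6 ** 2 - C[(sequence[i], sequence[j])] * inv_r6)
    return e_bend + e_lj
```

For a straight three-monomer chain, the bending term vanishes and only the single nonbonded pair at distance 2 contributes, which makes the function easy to check by hand.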



TABLE I: The three AB 20-mers studied in this Letter and the values of the associated (putative)
global energy minima. Note that the given values for sequence S3 belong to two different, almost
degenerate folds.

label   sequence                       global energy minimum
S1      $BA_6BA_4BA_2BA_2B_2$          -33.8236
S2      $A_4BA_2BABA_2B_2A_3BA_2$      -34.4892
S3      $A_4B_2A_4BA_2BA_3B_2A$        -33.5838, -33.5116
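The stated composition (20 monomers, 14 A and 6 B per sequence) can be checked with a short script. This is only a sketch: it assumes that the digits in the compact notation are repeat counts for the preceding letter and that S1 reads BA6BA4BA2BA2B2, which is the reading consistent with the stated composition. The helper names `expand` and `composition` are hypothetical.

```python
import re

# Compact sequence notation; a digit is the repeat count of the preceding letter.
SEQUENCES = {
    "S1": "BA6BA4BA2BA2B2",
    "S2": "A4BA2BABA2B2A3BA2",
    "S3": "A4B2A4BA2BA3B2A",
}


def expand(compact):
    """Expand e.g. 'A4B2' into 'AAAABB'; a missing count means 1."""
    return "".join(m.group(1) * int(m.group(2) or 1)
                   for m in re.finditer(r"([AB])(\d*)", compact))


def composition(compact):
    """Return (chain length, number of A, number of B)."""
    full = expand(compact)
    return len(full), full.count("A"), full.count("B")


if __name__ == "__main__":
    for label, seq in SEQUENCES.items():
        print(label, expand(seq), composition(seq))
```

Since all three sequences contain the same monomers, sorting the expanded strings gives identical results, which is what "permutations of each other" means here.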



Since the number of degrees of freedom (virtual bond and torsion angles) in the coarse-grained
model is comparable with the number of dihedral angles in all-atom protein models,
AB heteropolymer folding is of similar complexity. The main advantage is the drastically
reduced computational effort for calculating the interactions, which allows more comprehensive
and systematic analyses of free-energy landscapes and folding channels in comparative
studies for different sequences. In the following, we perform such an analysis of characteristic
folding behaviors based on a suitably defined generalized angular overlap parameter, as
introduced in Ref. [17] in analogy to all-atom studies [...]. It is a computationally low-cost
measure for the similarity of two conformations, where the differences of the angular degrees
of freedom are calculated. In order to use this parameter as a kind of order parameter,
it is useful to compare conformations $\mathbf{X}$ of the actual ensemble with a suitable reference
structure $\mathbf{X}^{(0)}$, which is preferably chosen to be the global-energy minimum conformation.



The overlap parameter is defined as

$$Q(\mathbf{X}) = 1 - d(\mathbf{X}). \qquad (3)$$

Denoting by $N_b = N - 2$ and $N_t = N - 3$ the numbers of bending angles $\vartheta_i$ and torsional
angles $\varphi_i$, respectively, the angular deviation between the conformations is calculated
according to

$$d(\mathbf{X}) = \frac{1}{\pi (N_b + N_t)} \left[ \sum_{i=1}^{N_b} d_b(\vartheta_i) + \min\left( \sum_{i=1}^{N_t} d_t^{+}(\varphi_i), \sum_{i=1}^{N_t} d_t^{-}(\varphi_i) \right) \right],$$

where $d_b(\vartheta_i) = |\vartheta_i - \vartheta_i^{(0)}|$ and
$d_t^{\pm}(\varphi_i) = \min\left(|\varphi_i \pm \varphi_i^{(0)}|, 2\pi - |\varphi_i \pm \varphi_i^{(0)}|\right)$. Note that this expression
takes into account the reflection symmetry $\varphi_i \to -\varphi_i$ of the AB model. Reflection-symmetric
conformations are not distinguished and therefore only the larger overlap, i.e., the smaller of
the two torsional sums, is considered. The
overlap is unity if all angles coincide, else $0 \le Q < 1$. The average overlap of a random
conformation with the reference state is for the three sequences close to $\langle Q \rangle = 0.66 \pm 0.02$.
Significant similarity is typically found if $Q > 0.8$.
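The definition of the angular overlap can be sketched in a few lines. The example below assumes that bending angles lie in $[0, \pi]$ and torsional angles in $(-\pi, \pi]$, that the deviation is normalized by $\pi (N_b + N_t)$, and that the smaller of the two torsional sums is used, in line with the statement that only the larger overlap is kept for reflection-symmetric conformations. The function name `angular_overlap` and its argument layout are hypothetical.

```python
import math


def angular_overlap(theta, phi, theta_ref, phi_ref):
    """Return Q = 1 - d for angle lists of a conformation and a reference.

    theta, theta_ref: bending angles in [0, pi]        (N_b values each)
    phi, phi_ref:     torsional angles in (-pi, pi]    (N_t values each)
    """
    n_b, n_t = len(theta), len(phi)
    # Bending contribution: sum of |theta_i - theta_i^(0)|.
    d_bend = sum(abs(t - t0) for t, t0 in zip(theta, theta_ref))

    def torsion_sum(sign):
        # sign = -1 compares with the reference itself,
        # sign = +1 compares with its mirror image (phi -> -phi).
        total = 0.0
        for p, p0 in zip(phi, phi_ref):
            delta = abs(p + sign * p0)
            total += min(delta, 2.0 * math.pi - delta)  # periodicity of torsions
        return total

    # Keep the smaller torsional deviation, i.e. the larger overlap.
    d_tors = min(torsion_sum(-1.0), torsion_sum(+1.0))
    d = (d_bend + d_tors) / (math.pi * (n_b + n_t))
    return 1.0 - d
```

With this convention a conformation and its mirror image both give $Q = 1$ against the same reference, so the two are not distinguished.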




FIG. 1: (Color online) Multicanonical histograms $H_{\rm muca}(E,Q)$ of energy $E$ and angular overlap
parameter $Q$ and free-energy landscapes $F(Q)$ at different temperatures for the three sequences
(a) S1, (b) S2, and (c) S3. The reference folds reside at $Q = 1$ and $E = E_{\rm min}$. Pseudophases are
symbolized by D (denatured states), N (native folds), I (intermediates), and M (metastable states).
Representative conformations in intermediate and folded phases are also shown.
For the qualitative discussion of the folding characteristics, we consider the multicanonical
histograms of energy $E$ and angular overlap $Q$,
$H_{\rm muca}(E, Q) = \sum_t \delta_{E,E(\mathbf{X}_t)}\, \delta_{Q,Q(\mathbf{X}_t)}$, where
the sum runs over all Monte Carlo sweeps $t$ in the multicanonical simulation, which yields
a constant energy distribution $h_{\rm muca}(E) = \int_0^1 dQ\, H_{\rm muca}(E,Q) \approx {\rm const}$. In consequence,
$H_{\rm muca}(E, Q)$ is useful for identifying the folding channels, independently of temperature.
Restricting the canonical partition function at temperature $T$ to the "microoverlap" ensemble
with overlap $Q$, $Z(Q) = \int \mathcal{D}\mathbf{X}\, \delta(Q - Q(\mathbf{X})) \exp\{-E(\mathbf{X})/k_B T\}$, where the integral is over
all possible conformations $\mathbf{X}$, we define the overlap free energy as $F(Q) = -k_B T \ln Z(Q)$.
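One way to estimate $F(Q)$ from a multicanonical time series is to reweight each sample to the canonical ensemble and accumulate the result in bins of $Q$. The sketch below is not taken from the source: it assumes that the logarithm of the multicanonical weight is available for every sample (argument `log_weights`), sets $k_B = 1$, and uses a hypothetical function name and a uniform binning of $Q \in [0, 1]$. The result is defined only up to an additive constant, which is fixed here by shifting the minimum to zero.

```python
import math


def overlap_free_energy(energies, overlaps, log_weights, temperature, n_bins=50):
    """Estimate F(Q) per Q bin from multicanonical samples (k_B = 1).

    Sample t contributes exp(-E_t / T - log W(E_t)) to its Q bin, where W is
    the multicanonical weight (assumed known). Empty bins are returned as
    None; the minimum of F is shifted to zero.
    """
    exponents = [[] for _ in range(n_bins)]
    for e, q, lw in zip(energies, overlaps, log_weights):
        b = min(int(q * n_bins), n_bins - 1)  # Q is assumed to lie in [0, 1]
        exponents[b].append(-e / temperature - lw)

    free = []
    for xs in exponents:
        if not xs:
            free.append(None)
            continue
        m = max(xs)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in xs))
        free.append(-temperature * log_z)

    f_min = min(f for f in free if f is not None)
    return [None if f is None else f - f_min for f in free]
```

Because the multicanonical run samples all energies roughly uniformly, the same time series can be reweighted to any temperature of interest by calling the function with different values of `temperature`.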

Figures 1(a)-(c) show the thus obtained multicanonical histograms $H_{\rm muca}(E, Q)$ (left)
and the overlap free-energy landscapes $F(Q)$ (right) at different temperatures for the three
sequences listed in Table I. The different branches of $H_{\rm muca}(E, Q)$ indicate the channels
the heteropolymer can follow in the folding process towards the reference structure. The
heteropolymers, whose sequences differ only by permutations, exhibit noticeable differences
in the folding behavior towards the native conformations. The first interesting observation
is that the minimalistic model used is capable of revealing the different folding behaviors of
the wild-type and permuted sequences. The second remarkable result is that the angular
overlap parameter $Q$ is a surprisingly clear measure for the peptide macrostate.

From Fig. 1(a) we conclude that folding of sequence S1 exhibits typical two-state
characteristics. Above the transition, conformations possess a random-coil-like overlap $Q \approx 0.7$,
i.e., there is no significant similarity with the reference structure. Close to $T \approx 0.1$ the global
minimum of the corresponding overlap free energy $F(Q)$ changes discontinuously towards
larger $Q$ values, and at the transition state the denatured (D) and the folded native (N)
macrostate are equally probable. The existence of this pronounced transition state is a
characteristic indication for first-order-like two-state folding. Decreasing the temperature
further, the native-fold-like conformations ($Q > 0.95$) dominate and fold smoothly towards
the $Q = 1$ reference structure, i.e., the lowest-energy conformation (N) found for sequence
S1, which is also depicted in Fig. 1(a).
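The statement that the D and N macrostates are equally probable at the transition can be checked numerically from a tabulated free-energy profile. The following sketch is an illustration only: it assumes $F(Q)$ is given on a uniform grid, sets $k_B = 1$, and uses a user-chosen dividing value `q_divide` (for example, the position of the barrier between the two minima). The function and argument names are hypothetical.

```python
import math


def basin_populations(q_grid, free_energy, temperature, q_divide):
    """Return (p_D, p_N), the Boltzmann weights of Q < q_divide and Q >= q_divide.

    q_grid:      uniformly spaced Q values
    free_energy: F(Q) on that grid (k_B = 1)
    """
    f_min = min(free_energy)  # shift to avoid overflow in exp
    w_d = w_n = 0.0
    for q, f in zip(q_grid, free_energy):
        w = math.exp(-(f - f_min) / temperature)
        if q < q_divide:
            w_d += w
        else:
            w_n += w
    total = w_d + w_n
    return w_d / total, w_n / total
```

Scanning the temperature and locating the point where the two populations cross gives an estimate of the folding-transition temperature for a two-state folder.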

The folding behavior of sequence S2 is significantly different, as Fig. 1(b) shows, and is
a typical example for a folding event through an intermediate (I) macrostate. The main
channel (D) bifurcates and a side channel (I) branches off continuously. For smaller energies
(or lower temperatures), this branching is followed by the formation of a third channel,
which ends in the native fold (N). The characteristic of folding through intermediates is also
reflected by the free-energy landscapes. Starting at high temperatures in the pseudophase
D of denatured conformations ($Q \approx 0.76$), the intermediary phase I with $Q \approx 0.9$ is reached
close to the temperature $T \approx 0.05$. Decreasing the temperature further below the native-folding
threshold close to $T = 0.01$, the hydrophobic-core formation is finished and stable
native-fold-like conformations N with $Q > 0.97$ dominate.

The most extreme behavior of the three exemplary sequences is found for sequence S3,
where the main channel (D) does not decay in favor of a native-fold channel. In fact, in
Fig. 1(c) we observe both two separate native-fold channels ($M_1$ and $M_2$) and the main
channel. Above the folding transition ($T = 0.2$), the typical sequence-independent denatured
(D) conformations ($Q \approx 0.77$) dominate. Annealing below the glass-transition threshold,
several channels form and coexist. The two most prominent channels (to which the lowest-energy
conformations $M_1$ and $M_2$ belong that we found in the simulations) eventually lead
for $T \approx 0.01$ to ensembles of states $M_1$ with $Q > 0.97$, which are similar to the reference
structure shown, and conformations $M_2$ with $Q \approx 0.75$. The lowest-energy conformation
found in $M_2$ is also shown in Fig. 1(c). It is structurally different but energetically almost
degenerate compared with the reference structure. It should also be noted that the lowest-energy
main-channel conformations have only slightly larger energies than the two native
folds. Thus, the folding of this heteropolymer is characterized by very complex, amorphous
folding behavior. In fact, the multiple-peaked distribution $H_{\rm muca}(E, Q)$ near minimum
energies is a strong indication for metastability and bears similarities with spin-glass
characteristics. A native fold in the natural sense does not exist: the $Q = 1$ conformation is only
a reference structure, and the folding towards this structure is not distinguished as it is in
the folding characteristics of sequences S1 and S2.

We have confirmed our results of the angular overlap analysis for the folding behaviors by
a corresponding study of the root mean square deviation (rmsd) which is frequently used to
characterize folding trajectories in free-energy landscapes. The main advantage of using our
angular overlap parameter is its efficient calculation which leads to a speed-up of computing
time by a factor of about 10 compared with the efforts required for analysing the folding
channels based on the rmsd [31].



To summarize, we have demonstrated in this study that within a minimalistic heteropolymer
framework it is possible to find clear indications for three different folding characteristics
known from real proteins by analysing macrostates based on an angular overlap parameter.
Our primary physical objective is a more comprehensive, qualitative understanding of universal
aspects of tertiary protein folding, where microscopic details are expected to be of less
relevance and which are, therefore, averaged out at a mesoscopic scale in a coarse-grained
model. For selected hydrophobic-polar heteropolymer sequences - not being explicitly designed
for this study - we have shown that characteristic folding behaviors such as two-state
folding, folding through intermediates, and metastability can be identified which are
qualitatively comparable with real folding events in nature. Beyond the general interest in a
theoretical understanding of the basic mechanisms of protein folding, the preparation of
synthetic peptide macrostates in future applications, e.g., the successful design of substrate-
or pattern-selective polymers [..., 30], is strongly connected with the complex
aspects of conformational folding transitions as investigated in this study.